(54) Method and device for the processing of images based on morphable models



(57) A method of processing an image of a three-dimensional object, comprising the steps of providing a
morphable object model being derived from a plurality of 3D images, matching the morphable object model to
at least one 2D object image, and providing the matched morphable object model as a 3D representation of the
object. A method of generating a morphable object model comprises the steps of generating a 3D database
comprising a plurality of 3D images of prototype objects, subjecting the data of the 3D database to a data
processing providing correspondences between the prototype objects and at least one reference object, and
providing the morphable object model as a set of objects comprising linear combinations of the shapes and
textures of the prototype objects.
Description

[0001] The present invention relates to a method for image processing, in particular to the manipulation (detecting,
recognizing and/or synthesizing) of images of three-dimensional objects, e.g. human faces, on the basis of a
morphable model for image synthesis. Furthermore, the present invention relates to an image processing system for
implementing such a method.

[0002] One field of image manipulation particularly concerns the manipulation of human faces. Modeling human faces
has challenged researchers in computer graphics since its beginning. Since the pioneering work of Parke [23, 24, see
the attached list of references], various techniques have been reported for modeling the geometry of faces
[9, 10, 20, 31, 19] and for animating them [26, 13, 17, 29, 20, 35, 27]. A detailed overview can be found in the
book of Parke and Waters [22].

[0003] The techniques developed for the animation of faces can be roughly divided into those that rely on physical
modeling of facial muscles [35], and those that apply previously captured facial expressions to a face [23, 2]. These
performance based animation techniques compute the correspondence between the different facial expressions of a
person by tracking markers glued to the face from image to image. To obtain photo-realistic face animations, a high
number of markers (e.g. up to 182 markers) have to be used [13].

[0004] Computer aided modeling of human faces still requires a great deal of expertise and manual control to avoid
unrealistic, non-face-like results. Most limitations of automated techniques for face synthesis, face animation or for
general changes in the appearance of an individual face can be described either as the problem of finding corresponding
feature locations in different faces or as the problem of separating realistic faces from faces that could never appear
in the real world. The correspondence problem is crucial for all morphing techniques, both for the application of
motion-capture data to pictures or 3D face models, and for most 3D face reconstruction techniques from images. A limited
number of labeled feature points marked in one face, e.g., the tip of the nose, the eye corner and less prominent points
on the cheek, must be located precisely in another face. The number of manually labeled feature points varies from
application to application, but usually ranges from 50 to 300. Only a correct alignment of all those points allows
acceptable intermediate morphs, a convincing mapping of motion data from the reference to a new model, or the
adaptation of a 3D face model to 2D images for 'video cloning'. Human knowledge and experience are necessary to
compensate for the variations between individual faces and to guarantee a valid location assignment in the different
faces. At present, automated matching techniques can be utilized only for very prominent feature points such as the
corners of eyes and mouth.

[0005] A second type of problem in face modeling is the separation of natural faces from non-faces. For this, human
knowledge is even more critical. Many applications involve the design of completely new natural looking faces that can
occur in the real world but which have no 'real' counterpart. Others require the manipulation of an existing face
according to changes in age, body weight or simply to emphasize the characteristics of the face. Such tasks usually
require time-consuming manual work combined with the skills of an artist. Further prior art techniques are illustrated
below.

[0006] It is the object of the invention to provide improved image processing methods and systems that are capable
of solving the above problems and that, in particular, process images of three-dimensional objects in a more flexible
and effective manner.

[0007] This object is solved by a method and a system comprising the features of claims 1, 7, 12, 13 and 14,
respectively. Advantageous embodiments and implementations of the invention are defined in the dependent claims.

[0008] According to the invention, a parametric face modeling technique is presented that addresses both of the above
problems. First, arbitrary human faces can be created while simultaneously controlling the likelihood of the generated
faces. Second, the system is able to compute correspondence between new faces. Exploiting the statistics of a large data
set of 3D face scans (geometric and textural data, Cyberware™), a morphable face model has been built which makes it
possible to recover domain knowledge about face variations by applying pattern classification methods. The morphable
face model is a multidimensional 3D morphing function that is based on the linear combination of a large number of 3D
face scans. Computing the average face and the main modes of variation in the dataset, a probability distribution is
imposed on the morphing function to avoid unlikely faces. Also, parametric descriptions of face attributes such as
gender, distinctiveness, "hooked" noses or the weight of a person have been derived by evaluating the distribution of
exemplar faces for each attribute within our face space.

[0009] Having constructed a parametric face model that is able to generate almost any face, the correspondence
problem turns into a mathematical optimization problem. New faces, images or 3D face scans, can be registered by
minimizing the difference between the new face and its reconstruction by the face model function. An algorithm has
been developed that adjusts the model parameters automatically for an optimal reconstruction of the target, requiring
only a minimum of manual initialization. The output of the matching procedure is a high quality 3D face model that is
in full correspondence with the morphable face model. Consequently, all face manipulations parameterized in the model
function can be mapped to the target face. The prior knowledge about the shape and texture of faces in general that
is captured in our model function is sufficient to make reasonable estimates of the full 3D shape and texture of a face
even when only a single picture is available. When applying the method to several images of a person, the
reconstructions reach almost the quality of laser scans.

[0010] The key part of the invention is a generalized model of human faces. Similar to the approach of DeCarlos et
al. [9], the range of allowable faces is restricted according to constraints derived from prototypical human faces.
However, instead of using a limited set of measurements and proportions between a set of facial landmarks, the densely
sampled geometry of the exemplar faces obtained by laser scanning (Cyberware™) is used directly. The dense
modeling of facial geometry (several thousand vertices per face) leads directly to a triangulation of the surface.
Consequently, there is no need for variational surface interpolation techniques [9, 21, 30]. The inventors also added a
model of texture variations between faces. The morphable 3D face model is a logical extension of the interpolation
technique between face geometries, as introduced by Parke [24]. By computing correspondence between individual 3D
face data automatically, the invention makes it possible to increase the number of vertices used in the face
representation from a few hundred to tens of thousands.

[0011] Moreover, a larger number of faces can be used, so that interpolation is possible between hundreds of 'basis'
faces rather than just a few. The goal of such an extended morphable face model is to represent any face as a linear
combination of a limited basis set of face prototypes. Representing the face of an arbitrary person as a linear
combination (morph) of "prototype" faces was first formulated for image compression in telecommunications [7].
Image-based linear 2D face models that exploit large data sets of prototype faces were developed for face recognition
and image coding [3, 16, 34].

[0012] Different approaches have been taken to automate the matching step necessary for building up morphable
models. One class of techniques is based on optical flow algorithms [4, 3] and another on an active model matching
strategy [11, 15]. Combinations of both techniques have been applied to the problem of image matching [33]. According
to the invention, an extension of this approach to the problem of matching 3D faces has been obtained.

[0013] The correspondence problem between different three-dimensional face data has been addressed previously
by Lee et al. [18]. Their shape-matching algorithm differs significantly from the invention in several respects. First,
the correspondence is computed in high resolution, considering shape and texture data simultaneously. Second, instead
of using a physical tissue model to constrain the range of allowed mesh deformations, the statistics of example faces
are used to keep deformations plausible. Third, the system of the invention does not rely on routines that are
specifically designed to detect the features exclusively found in human faces, e.g., eyes, nose.

[0014] The general matching strategy of the invention can be used not only to adapt the morphable model to a 3D
face scan, but also to 2D images of faces. Unlike a previous approach [32], the morphable 3D face model is now
directly matched to images, avoiding the detour of generating intermediate 2D morphable image models. As an
advantageous consequence, head orientation, illumination conditions and other parameters can be free variables subject
to optimization. It is sufficient to use rough estimates of their values as a starting point of the automated matching
procedure.

[0015] Most techniques for 'face cloning', the reconstruction of a 3D face model from one or more images, still rely
on manual assistance for matching a deformable 3D face model to the images [24, 1, 28]. The approach of Pighin et
al. [26] demonstrates the high realism that can be achieved for the synthesis of faces and facial expressions from
photographs where several images of a face are matched to a single 3D face model. The automated matching
procedure of the invention could be used to replace the manual initialization step, where several corresponding
features have to be labeled in the presented images.

[0016] One particular advantage of the invention is that it works directly on faces without manual markers. In the
automated approach, the number of markers is extended to its limit. It matches the full number of vertices available in
the face model to images. The resulting dense correspondence fields can even capture changes in wrinkles and map
these from one face to another.

[0017] The invention teaches a new technique for modeling textured 3D faces. 3D faces can either be generated
automatically from one or more photographs, or modeled directly through an intuitive user interface. Users are assisted
in two key problems of computer aided face modeling. First, new face images or new 3D face models can be registered
automatically by computing dense one-to-one correspondence to an internal face model. Second, the approach
regulates the naturalness of modeled faces, avoiding faces with an "unlikely" appearance.

[0018] Applications of the invention are in particular in the field of facial modeling, registration, photogrammetry, 
morphing, facial animation, and computer vision. 

[0019] Further advantages and details of the invention are described with reference to the attached drawings, which
show:

Figure 1: a schematic representation of basic principles of the invention,

Figure 2: an illustration of the face synthesis on the basis of the morphable model,

Figure 3: an illustration of the variation of facial attributes of a single face,

Figure 4: a flow chart illustrating the processing steps for reconstructing 3D shape and texture of a new face from
a single image,

Figure 5: a flow chart of the simultaneous reconstruction of a 3D shape and texture of a new face from two images,

Figure 6: an illustration of the generation of new images with modified rendering parameters,

Figure 7: an illustration of the reconstruction of a 3D face of Mona Lisa on the basis of the invention, and

Figure 8: a schematic illustration of an image processing system according to the invention.

[0020] As illustrated in Figure 1, starting from an example set of 3D face models, a morphable face model is derived
by transforming the shape and texture of the examples into a vector space representation. The morphable face model
contributes to two main steps in face manipulation: (1) deriving a 3D face model from a novel image, and (2) modifying
shape and texture in a natural way. New faces and expressions can be modeled by forming linear combinations of the
prototypes. Shape and texture constraints derived from the statistics of our example faces are used to guide manual
modeling or automated matching algorithms. 3D face reconstructions from single images and their applications for
photo-realistic image manipulations can be obtained. Furthermore, face manipulations according to complex
parameters such as gender, fullness of a face or its distinctiveness are demonstrated.

[0021] The further description is structured as follows. It starts with a description (I) of the database of 3D face scans
from which our morphable model is built. In the following section (II), the concept of the morphable face model is
introduced, assuming a set of 3D face scans that are in full correspondence. Exploiting the statistics of a dataset, a
parametric description of faces is derived, as well as the range of plausible faces. Additionally, facial attributes from
the labeled data set are mapped to the parameter space of the model. In section III, a method for matching the flexible
model of the invention to novel images or 3D scans of faces is described. Along with a 3D reconstruction, the method
can compute correspondence, based on the morphable model. Section IV describes an iterative method for building
a morphable model automatically from a raw data set of 3D face scans when no correspondences between the exemplar
faces are available. Finally, applications of the technique to novel images will be shown.

[0022] The description of the method according to the invention refers to the attached figures. It is emphasized that,
for printing reasons (in a patent application), these figures cannot reflect the high quality of the images
obtained by the invention.



[0023] Laser scans (Cyberware™) of 200 heads of young adults (100 male and 100 female) were used for obtaining
prototypes. The laser scans provide head structure data in a cylindrical representation, with radii $r(h, \phi)$ of
surface points sampled at 512 equally spaced angles $\phi$ and at 512 equally spaced vertical steps h. Additionally,
the RGB color values $R(h, \phi)$, $G(h, \phi)$, and $B(h, \phi)$ were recorded in the same spatial resolution and
were stored in a texture map with 8 bits per channel.
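
As an illustration of the cylindrical representation described above, the following minimal sketch converts a grid of radii into 3D surface points. The function name, the choice of the y axis as the cylinder axis, the assumption that the angular samples cover one full turn, and the `height_step` spacing are all hypothetical choices made for this example and are not taken from the scanner specification.

```python
import math


def cylindrical_to_cartesian(radii, height_step=1.0):
    """Convert radii[h][j] = r(h, phi_j) into a list of (x, y, z) points.

    Assumptions of this sketch: the angular samples cover one full turn in
    equal steps, the cylinder axis is the y axis, and height_step is the
    (hypothetical) vertical spacing between scan rows.
    """
    points = []
    for h, row in enumerate(radii):
        for j, r in enumerate(row):
            phi = 2.0 * math.pi * j / len(row)
            points.append((r * math.cos(phi), h * height_step, r * math.sin(phi)))
    return points
```

For a scan with 512 rows of 512 radii each, the function would return 512 × 512 points, one per sample, in row-major order.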

[0024] All faces were without makeup, accessories, and facial hair. The subjects were scanned wearing bathing
caps, which were removed digitally. Additional automatic pre-processing of the scans, which for most heads required no
human interaction, consisted of a vertical cut behind the ears, a horizontal cut to remove the shoulders, and a
normalization routine that brought each face to a standard orientation and position in space. The resultant faces were
represented by approximately 70,000 vertices and the same number of color values.

II Morphable 3D Face Model 

[0025] The morphable model is based on a data set of 3D faces. Morphing between faces requires full
correspondence between all of the faces. In this section, it is assumed that all exemplar faces are in full
correspondence. The algorithm for computing correspondence will be described in Section IV.

[0026] We represent the geometry of a face with a shape-vector
$S = (X_1, Y_1, Z_1, X_2, \ldots, Y_n, Z_n)^T \in \mathbb{R}^{3n}$, that contains the X, Y, Z-coordinates of its n
vertices. For simplicity, we assume that the number of valid texture values in the texture map is equal to the number
of vertices. We therefore represent the texture of a face by a texture-vector
$T = (R_1, G_1, B_1, R_2, \ldots, G_n, B_n)^T \in \mathbb{R}^{3n}$, that contains the R, G, B color values of the n
corresponding vertices. A morphable face model was then constructed using a data set of m exemplar faces, each
represented by its shape-vector $S_i$ and texture-vector $T_i$. Since we assume all faces in full correspondence (see
Section IV), new shapes $S_{model}$ and new textures $T_{model}$ can be expressed in barycentric coordinates as a linear
combination of the shapes and textures of the m exemplar faces:

$$S_{mod} = \sum_{i=1}^{m} a_i S_i, \qquad T_{mod} = \sum_{i=1}^{m} b_i T_i, \qquad \sum_{i=1}^{m} a_i = \sum_{i=1}^{m} b_i = 1.$$




[0027] We define the morphable model as the set of faces $(S_{mod}(\vec{a}), T_{mod}(\vec{b}))$, parameterized by the
coefficients $\vec{a} = (a_1, a_2, \ldots, a_m)^T$ and $\vec{b} = (b_1, b_2, \ldots, b_m)^T$. (Standard morphing between
two faces (m = 2) is obtained if the parameters $a_1$, $b_1$ are varied between 0 and 1, setting $a_2 = 1 - a_1$ and
$b_2 = 1 - b_1$.) Arbitrary new faces can be generated by varying the parameters $\vec{a}$ and $\vec{b}$ that control
shape and texture.
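
The linear combination of exemplar vectors described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `morph` is hypothetical, vectors are plain Python lists of equal length, and the check that the coefficients sum to 1 reflects the barycentric constraint mentioned in the text.

```python
def morph(vectors, coefficients):
    """Return sum_i coefficients[i] * vectors[i] for equally long vectors.

    The coefficients are treated as barycentric, so they must sum to 1.
    """
    if len(vectors) != len(coefficients):
        raise ValueError("need one coefficient per exemplar vector")
    if abs(sum(coefficients) - 1.0) > 1e-9:
        raise ValueError("barycentric coefficients must sum to 1")
    length = len(vectors[0])
    return [
        sum(c * v[k] for c, v in zip(coefficients, vectors))
        for k in range(length)
    ]
```

With two exemplars and coefficients (0.5, 0.5), the result is the standard halfway morph; the same function would be applied separately to shape vectors and to texture vectors.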
[0028] For a useful face synthesis system, it is important to be able to quantify the results in terms of their plausibility
of being faces. We therefore estimated the probability distribution for the coefficients $a_i$ and $b_i$ from our example
set of faces. This distribution enables us to control the likelihood of the coefficients $a_i$ and $b_i$ and consequently
regulates the likelihood of the appearance of the generated faces.

[0029] We fit a multivariate normal distribution to our data set of 200 faces based on the averages of shape $\bar{S}$ and
texture $\bar{T}$ and the covariance matrices $C_S$ and $C_T$ computed over the shape and texture differences
$\Delta S_i = S_i - \bar{S}$ and $\Delta T_i = T_i - \bar{T}$.

[0030] A common technique for data compression known as Principal Component Analysis (PCA) [14] performs a
basis transformation to an orthogonal coordinate system formed by the eigenvectors $s_i$ and $t_i$ of the covariance
matrices (in descending order according to their eigenvalues):

$$S_{model} = \bar{S} + \sum_{i=1}^{m-1} \alpha_i s_i, \qquad T_{model} = \bar{T} + \sum_{i=1}^{m-1} \beta_i t_i, \qquad (1)$$

$\vec{\alpha}, \vec{\beta} \in \mathbb{R}^{m-1}$. The probability for coefficients $\vec{\alpha}$ is given by

$$p(\vec{\alpha}) \sim \exp\left[-\frac{1}{2} \sum_{i=1}^{m-1} (\alpha_i / \sigma_i)^2\right], \qquad (2)$$

with $\sigma_i^2$ being the eigenvalues of the shape covariance matrix $C_S$. The probability $p(\vec{\beta})$ is computed
similarly.

Segmented morphable model: The morphable model described in equation (1) has m - 1 degrees of freedom for
texture and m - 1 for shape. The expressiveness of the model can be increased by dividing faces into independent
subregions that are morphed independently, for example into eyes, nose, mouth and a surrounding region (see Figure
2).
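
The normal distribution over the PCA coefficients described above lends itself to a short numerical illustration. The sketch below evaluates the unnormalized log-probability of a coefficient vector, assuming independent components whose variances are the eigenvalues of the covariance matrix; the function name `log_prior` and the list-based interface are hypothetical.

```python
def log_prior(coefficients, eigenvalues):
    """Unnormalized log-probability  -1/2 * sum_i alpha_i^2 / sigma_i^2.

    eigenvalues[i] plays the role of sigma_i^2, the variance along the
    i-th principal component. The normalization constant is omitted.
    """
    return -0.5 * sum(a * a / ev for a, ev in zip(coefficients, eigenvalues))
```

The average face (all coefficients zero) has the highest value, 0, and faces far from the average along low-variance directions are penalized most strongly.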

[0031] According to Figure 2, a single prototype adds a large variety of new faces to the morphable model. The
deviation of the prototype from the average is added (+) or subtracted (-) from the average. A standard morph (*) is
located halfway between the average and the prototype. Subtracting the differences from the average yields an
"anti"-face (#). Adding and subtracting deviations independently from shape (S) and texture (T) on each of four segments
produces a number of distinct faces.

[0032] Since all faces are assumed to be in correspondence, it is sufficient to define these regions on a reference
face. This segmentation is equivalent to subdividing the vector space of faces into independent subspaces. A complete
3D face is generated by computing linear combinations for each segment separately and blending them at the borders
according to an algorithm proposed for images by [6].

II.1 Facial attributes

[0033] Shape and texture coefficients $\alpha_i$ and $\beta_i$ in our morphable face model do not correspond to the
facial attributes used in human language. While some facial attributes can easily be related to biophysical
measurements [12, 9], such as the width of the mouth, others such as facial femininity or being more or less bony can
hardly be described by numbers. In this section, a method is described for mapping facial attributes, defined by a
hand-labeled set of example faces, to the parameter space of our morphable model. At each position in face space (that
is, for any possible face), we define shape and texture vectors that, when added to or subtracted from a face, will
manipulate a specific attribute while keeping all other attributes as constant as possible.

[0034] In this framework, changes in facial expression generated by performance based techniques [23] can be
transferred by adding the differences between two expressions of the same individual,
$\Delta S = S_{expression} - S_{neutral}$, $\Delta T = T_{expression} - T_{neutral}$, to a different individual in a
neutral expression.
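
The expression transfer described above amounts to adding a difference vector to another face. The following minimal sketch shows this for one vector (shape or texture); the function and argument names are hypothetical, and all vectors are assumed to be in full correspondence.

```python
def transfer_expression(neutral_a, expressive_a, neutral_b):
    """Add A's expression offset (expressive_a - neutral_a) to B's neutral vector."""
    return [nb + (ea - na) for na, ea, nb in zip(neutral_a, expressive_a, neutral_b)]
```

The same call would be made once with shape vectors and once with texture vectors of the two individuals.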

[0035] Unlike facial expressions, attributes that are invariant for each individual are more difficult to isolate. The
following method makes it possible to model facial attributes such as gender, fullness of faces, darkness of eyebrows,
double chins, and hooked versus concave noses (Figure 3). Figure 3 illustrates the variation of facial attributes of a
single face. The appearance of an original face (with frame) can be changed by adding or subtracting shape and texture
vectors specific to the attributes.

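[0036] Based on a set of faces $(S_i, T_i)$ with manually assigned labels $\mu_i$ describing the markedness of the
attribute, we compute weighted sums

$$\Delta S = \sum_{i=1}^{m} \mu_i (S_i - \bar{S}), \qquad \Delta T = \sum_{i=1}^{m} \mu_i (T_i - \bar{T}), \qquad \sum_{i=1}^{m} \mu_i = 0. \qquad (3)$$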



[0037] Multiples of $(\Delta S, \Delta T)$ can now be added to or subtracted from any individual face. For binary
attributes, such as gender, setting $\mu_A = 1/m_A$ for faces in class A, and $\mu_B = -1/m_B$ for those in B, Eq. (3)
yields the difference between the averages of class A and B.
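
To make the weighted sum concrete, the sketch below computes an attribute direction from labeled example vectors and builds the binary labels described above. It is an illustrative sketch only: the function names and the use of the strings "A" and "B" as class tags are assumptions made for this example.

```python
def attribute_direction(vectors, labels):
    """Weighted sum  sum_i mu_i * (v_i - mean)  over labeled example vectors."""
    m = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / m for k in range(dim)]
    return [
        sum(mu * (v[k] - mean[k]) for mu, v in zip(labels, vectors))
        for k in range(dim)
    ]


def binary_labels(classes):
    """Labels 1/m_A for class "A" and -1/m_B for class "B"."""
    m_a = classes.count("A")
    m_b = classes.count("B")
    return [1.0 / m_a if c == "A" else -1.0 / m_b for c in classes]
```

For two class-A vectors (0, 0) and (2, 0) and one class-B vector (4, 4), the direction equals the class-A average (1, 0) minus the class-B average (4, 4), which matches the statement about binary attributes.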



[0038] To justify this method, let $\mu(S, T)$ be the overall function describing the markedness of the attribute in a
face (S, T). Since $\mu(S, T)$ is not available per se for all (S, T), the regression problem of estimating $\mu(S, T)$
from a sample set of labeled faces has to be solved. The present technique assumes that $\mu(S, T)$ is a linear function.
Consequently, in order to achieve a change $\Delta\mu$ of the attribute, there is only a single optimal direction
$(\Delta S, \Delta T)$ for the whole space of faces. It can be shown that Equation (3) yields the direction with minimal
variance-normalized length

$$\|\Delta S\|_M^2 = \langle \Delta S, C_S^{-1} \Delta S \rangle, \qquad \|\Delta T\|_M^2 = \langle \Delta T, C_T^{-1} \Delta T \rangle.$$

[0039] A different kind of facial attribute is its "distinctiveness", which is commonly manipulated in caricatures. The
automated production of caricatures has been possible for many years [5]. This technique can easily be extended from
2D images to the present morphable face model. Individual faces are caricatured by increasing their distance from the
average face. In our representation, shape and texture coefficients $\alpha_i$, $\beta_i$ are multiplied by a constant
factor or different factors.

III Matching a morphable model to images

[0040] An aspect of the invention is an algorithm for automatically matching the morphable face model to one or
more images. Providing an estimate of the face's 3D structure (Figure 4), it closes the gap between the specific
manipulations described in Section II.1 and the type of data available in typical applications.

[0041] The processing steps for reconstructing 3D shape and texture of a new face from a single image are illustrated
in the flow chart of Figure 4. After a rough manual alignment of the averaged 3D head (top row), the automated
matching procedure fits the 3D morphable model to the image (center row). In the right column, the model is rendered
on top of the input image. Details in texture can be improved by illumination-corrected texture extraction from the input
(bottom row). This correction comprises a back-projection of the generated image to the input image with an illumination
correction. The color information from the original image is used for correcting the generated image. This
illumination correction by back-projection represents an important and advantageous feature of the invention.

[0042] Coefficients of the 3D model are optimized along with a set of rendering parameters such that they produce
an image as close as possible to the input image. In an analysis-by-synthesis loop, the algorithm creates a texture
mapped 3D face from the current model parameters, renders an image, and updates the parameters according to the
residual difference. It starts with the average head and with rendering parameters roughly estimated by the user.
[0043] Model Parameters: Facial shape and texture are defined by coefficients $\alpha_j$ and $\beta_j$,
j = 1, ..., m - 1 (Equation 1). Rendering parameters $\vec{\rho}$ depend on the application and contain camera position
(azimuth and elevation), object scale, image plane rotation and translation, intensity
$i_{r,amb}, i_{g,amb}, i_{b,amb}$ of ambient light, and/or intensity $i_{r,dir}, i_{g,dir}, i_{b,dir}$ of directed
light. In order to handle photographs taken under a wide variety of conditions, $\vec{\rho}$ also includes color contrast
as well as offset and gain in the red, green, and blue channel. Other parameters, such as camera distance, light
direction, and surface shininess, remain fixed to the values estimated by the user or with an appropriate algorithm.

[0044] From parameters $(\vec{\alpha}, \vec{\beta}, \vec{\rho})$, colored images

$$I_{model}(x, y) = (I_{r,mod}(x, y), I_{g,mod}(x, y), I_{b,mod}(x, y))^T \qquad (4)$$

are rendered using perspective projection and the Phong illumination model. The reconstructed image is supposed to
be closest to the input image in terms of Euclidean distance

$$E_I = \sum_{x,y} \| I_{input}(x, y) - I_{model}(x, y) \|^2.$$

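[0045] Matching a 3D surface to a given image is an ill-posed problem. Along with the desired solution, many
non-face-like surfaces lead to the same image. It is therefore essential to impose constraints on the set of solutions.
It is an essential advantage of the invention that in the present morphable model, shape and texture vectors are
restricted to the vector space spanned by the database. Accordingly, non-face-like surfaces can be completely avoided.

[0046] Within the vector space of faces, solutions can be further restricted by a tradeoff between matching quality
and prior probabilities, using $P(\vec{\alpha})$, $P(\vec{\beta})$ from Section II and an ad-hoc estimate of
$P(\vec{\rho})$. In terms of Bayes decision theory, the problem is to find the set of parameters
$(\vec{\alpha}, \vec{\beta}, \vec{\rho})$ with maximum posterior probability, given an image $I_{input}$. While
$\vec{\alpha}$, $\vec{\beta}$, and rendering parameters $\vec{\rho}$ completely determine the predicted image
$I_{model}$, the observed image $I_{input}$ may vary due to noise. For Gaussian noise with a standard deviation
$\sigma_N$, the likelihood to observe $I_{input}$ is
$p(I_{input} \mid \vec{\alpha}, \vec{\beta}, \vec{\rho}) \sim \exp\left[-\frac{1}{2\sigma_N^2} \cdot E_I\right]$.
Maximum posterior probability is then achieved by minimizing the cost function

$$E = \frac{1}{\sigma_N^2} E_I + \sum_{j=1}^{m-1} \frac{\alpha_j^2}{\sigma_{S,j}^2} + \sum_{j=1}^{m-1} \frac{\beta_j^2}{\sigma_{T,j}^2} + \sum_{j} \frac{(\rho_j - \bar{\rho}_j)^2}{\sigma_{\rho,j}^2}, \qquad (5)$$

where $\sigma_{S,j}^2$ and $\sigma_{T,j}^2$ are the eigenvalues of $C_S$ and $C_T$, and $\bar{\rho}_j$ and
$\sigma_{\rho,j}$ describe the ad-hoc prior on the rendering parameters.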
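
The trade-off between matching quality and prior probability described above can be illustrated by a cost that adds a noise-weighted image error to quadratic penalty terms. The following is a minimal sketch assuming independent Gaussian priors on all parameters; the function name `map_cost` and all argument names are hypothetical, and the image error is passed in as an already computed number.

```python
def map_cost(image_error, sigma_n, alpha, sigma_s, beta, sigma_t,
             rho, rho_mean, sigma_rho):
    """Image error weighted by 1/sigma_n^2 plus quadratic prior penalties.

    sigma_s, sigma_t and sigma_rho hold standard deviations (not variances)
    for the shape, texture and rendering parameters respectively.
    """
    data_term = image_error / sigma_n ** 2
    shape_term = sum((a / s) ** 2 for a, s in zip(alpha, sigma_s))
    texture_term = sum((b / s) ** 2 for b, s in zip(beta, sigma_t))
    render_term = sum(
        ((r - r0) / s) ** 2 for r, r0, s in zip(rho, rho_mean, sigma_rho)
    )
    return data_term + shape_term + texture_term + render_term
```

A larger `sigma_n` reduces the influence of the image error, so the minimum moves toward the prior expectation; a smaller `sigma_n` favors matching quality.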



[0047] The optimization algorithm described below uses an estimate of E based on a random selection of surface
points. Predicted color values $I_{model}$ are easiest to evaluate in the centers of triangles. In the center of triangle
k, texture $(\bar{R}_k, \bar{G}_k, \bar{B}_k)^T$ and 3D location $(\bar{X}_k, \bar{Y}_k, \bar{Z}_k)^T$ are averages of
the values at the corners. Perspective projection maps these points to image locations
$(\bar{p}_{x,k}, \bar{p}_{y,k})^T$. Surface normals $\mathbf{n}_k$ of each triangle k are determined by the 3D locations
of the corners. According to Phong illumination, the color components $I_{r,model}$, $I_{g,model}$ and $I_{b,model}$
take the form

$$I_{r,model,k} = (i_{r,amb} + i_{r,dir} \cdot (\mathbf{n}_k \cdot \mathbf{l})) \bar{R}_k + i_{r,dir} \, s \cdot (\mathbf{r}_k \cdot \mathbf{v}_k)^{\nu} \qquad (6)$$

where $\mathbf{l}$ is the direction of illumination, $\mathbf{v}_k$ the normalized difference of camera position and the
position of the triangle's center, and $\mathbf{r}_k = 2(\mathbf{n}_k \cdot \mathbf{l})\mathbf{n}_k - \mathbf{l}$ the
direction of the reflected ray. s denotes surface shininess, and $\nu$ controls the angular distribution of the specular
reflection. Equation (6) reduces to $I_{r,model,k} = i_{r,amb} \bar{R}_k$ if a shadow is cast on the center of
the triangle, which is tested in a method described below.
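
The Phong color computation for a single channel at a triangle center can be sketched as follows. This is an illustrative sketch, not the renderer of the invention: the function names are hypothetical, all direction vectors are assumed to be unit length, and clamping negative dot products to zero is an added assumption so that back-facing configurations do not produce negative or complex values.

```python
def dot(a, b):
    """Scalar product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def phong_channel(albedo, i_amb, i_dir, normal, light, view,
                  shininess, nu, shadowed=False):
    """One color channel at a triangle center under Phong illumination.

    albedo is the averaged texture value, i_amb and i_dir the ambient and
    directed light intensities for this channel. Negative dot products are
    clamped to zero (an assumption of this sketch).
    """
    if shadowed:
        return i_amb * albedo  # only ambient light reaches a shadowed point
    n_dot_l = dot(normal, light)
    reflected = [2.0 * n_dot_l * n - l for n, l in zip(normal, light)]
    diffuse = i_dir * max(0.0, n_dot_l)
    specular = i_dir * shininess * max(0.0, dot(reflected, view)) ** nu
    return (i_amb + diffuse) * albedo + specular
```

The function would be called three times per triangle, once for each of the red, green and blue channels with the corresponding intensities and albedo.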

[0048] For high resolution 3D meshes, variations in $I_{model}$ across each triangle $k \in \{1, \ldots, n_t\}$ are
small, so $E_I$ may be approximated by

$$E_I \approx \sum_{k=1}^{n_t} a_k \cdot \| I_{input}(\bar{p}_{x,k}, \bar{p}_{y,k}) - I_{model,k} \|^2,$$

where $a_k$ is the image area covered by triangle k. If the triangle is occluded, $a_k = 0$.

[0049] In gradient descent, contributions from different triangles of the mesh would be redundant. In each iteration,
we therefore select a random subset $K \subset \{1, \ldots, n_t\}$ of 40 triangles k and replace $E_I$ by

$$E_K = \sum_{k \in K} \| I_{input}(\bar{p}_{x,k}, \bar{p}_{y,k}) - I_{model,k} \|^2. \qquad (7)$$

[0050] The probability of selecting k is $p(k \in K) \sim a_k$. This method of stochastic gradient descent [15] is not only
more efficient computationally, but also helps to avoid local minima by adding noise to the gradient estimate.
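
The area-weighted random selection of triangles can be sketched with the standard library. This is a minimal sketch under stated assumptions: the function name is hypothetical, and `random.choices` draws with replacement, so the same triangle may appear more than once in a subset, which the text does not specify either way.

```python
import random


def sample_triangles(areas, subset_size=40, rng=random):
    """Draw triangle indices with probability proportional to image area.

    areas[k] is the image area covered by triangle k; occluded triangles
    have area 0 and are therefore never selected. Sampling is done with
    replacement (an assumption of this sketch).
    """
    return rng.choices(range(len(areas)), weights=areas, k=subset_size)
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible, which can help when debugging an optimization run.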
[0051] Before the first iteration, and once every 1000 steps, the method computes the full 3D shape of the current
model, and 2D positions $(p_x, p_y)^T$ of all vertices. It then determines $a_k$ and detects hidden surfaces and cast
shadows in a two-pass z-buffer technique. We assume that occlusions and cast shadows are constant during each subset of
iterations.

[0052] Parameters are updated depending on analytical derivatives of the cost function E, using
$\alpha_j \to \alpha_j - \lambda_j \cdot \partial E / \partial \alpha_j$, and similarly for $\beta_j$ and $\rho_j$, with
suitable factors $\lambda_j$.

[0053] Derivatives of texture and shape (Equation 1) yield derivatives of 2D locations
$(\bar{p}_{x,k}, \bar{p}_{y,k})^T$, surface normals $\mathbf{n}_k$, vectors $\mathbf{v}_k$ and $\mathbf{r}_k$, and
$I_{model,k}$ (Equation 6) using the chain rule. From Equation (7), partial derivatives
$\partial E_K / \partial \alpha_j$, $\partial E_K / \partial \beta_j$ and $\partial E_K / \partial \rho_j$ can be obtained.
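
The update rule described above can be illustrated on a toy problem in which the "rendered" model is simply a mean vector plus a linear combination of basis vectors, and the cost is the squared residual. This sketch omits projection, illumination and the prior terms entirely; the function names, the fixed step size and the iteration count are assumptions chosen for the example.

```python
def synthesize(mean, basis, alpha):
    """Model vector: mean + sum_j alpha_j * basis_j."""
    return [m + sum(a * b[k] for a, b in zip(alpha, basis))
            for k, m in enumerate(mean)]


def fit(target, mean, basis, step=0.1, iterations=200):
    """Gradient descent on E(alpha) = sum_k (target_k - model_k)^2."""
    alpha = [0.0] * len(basis)  # start from the average
    for _ in range(iterations):
        model = synthesize(mean, basis, alpha)
        residual = [t - s for t, s in zip(target, model)]
        # Analytical derivative: dE/dalpha_j = -2 * sum_k residual_k * basis_j[k]
        grad = [-2.0 * sum(r * b[k] for k, r in enumerate(residual))
                for b in basis]
        alpha = [a - step * g for a, g in zip(alpha, grad)]
    return alpha
```

Each pass follows the same pattern as the loop described in the text: synthesize from the current parameters, compare with the target, and move the parameters along the negative gradient.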

[0054] Coarse-to-Fine: In order to avoid local minima, the algorithm follows a coarse-to-fine strategy in several
respects:

a) The first set of iterations is performed on a down-sampled version of the input image with a low resolution
morphable model.

b) We start by optimizing only the first coefficients $\alpha_j$ and $\beta_j$ controlling the first principal
components, along with all parameters $\rho_j$. In subsequent iterations, more and more principal components are added.

c) Starting with a relatively strong weight on prior probability in equation (5), which ties the optimum towards the
prior expectation value, we reduce this weight (or equivalently $\sigma_N$) to obtain maximum matching quality.

d) In the last iterations, the face model is broken down into segments (Section II). With parameters $\rho_j$ fixed,
coefficients $\alpha_j$ and $\beta_j$ are optimized independently for each segment. This increased number of degrees of
freedom significantly improves facial details.
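
Points b) and c) of the strategy above can be pictured as a schedule that grows the number of active principal components while shrinking the weight on the prior. The sketch below is purely illustrative: the linear growth of the component count and the 1/stage decay of the prior weight are assumptions of this example, since the text does not give concrete values.

```python
def coarse_to_fine_schedule(total_components, stages):
    """Return a list of (active_components, prior_weight) pairs, one per stage.

    The component count grows linearly to total_components and the prior
    weight decays as 1/stage; both rules are hypothetical.
    """
    plan = []
    for stage in range(1, stages + 1):
        active = max(1, (total_components * stage) // stages)
        prior_weight = 1.0 / stage
        plan.append((active, prior_weight))
    return plan
```

An optimizer would run a block of iterations for each entry, restricting updates to the first `active_components` coefficients and scaling the prior terms by `prior_weight`.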

[0055] Multiple Images: It is straightforward to extend this technique to the case where several images of a person
are available (Figure 5). Figure 5 illustrates a simultaneous reconstruction of 3D shape and texture of a new face from
two images taken under different conditions. In the center row, the 3D face is rendered on top of the input images.
Figure 5 demonstrates an essential advantage of the invention. The image processing method can be implemented
with one or more input images. There are no restrictions with regard to the imaging conditions of the input images.
This is a particular difference from 3D reconstruction on the basis of image pairs taken with a parallax
(pseudo-stereo images).
[0056] While shape and texture are still described by a common set of $\alpha_j$ and $\beta_j$, there is now a separate set of $\rho_j$ for
each input image. $E_I$ is replaced by a sum of image distances for each pair of input and model images, and all parameters
are optimized simultaneously.
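Written out, and assuming that the image distance is the sum of squared color differences over all pixels as in the single-image case, the combined error for $M$ input images could take the following form. The index $m$ and the notation $\rho_j^{(m)}$ for the image-specific parameters are introduced here for illustration only.

```latex
E_I \;=\; \sum_{m=1}^{M} \sum_{x,y}
  \bigl\| \mathbf{I}_{input,m}(x,y) - \mathbf{I}_{model,m}(x,y) \bigr\|^2 ,
\qquad
\mathbf{I}_{model,m} \text{ rendered from shared } \alpha_j, \beta_j
\text{ and image-specific } \rho_j^{(m)} .
```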

[0057] Illumination-Corrected Texture Extraction: Specific features of individual faces that are not captured by
the morphable model, such as blemishes, are extracted from the image in a subsequent texture adaptation process.
Extracting texture from images is a technique widely used in constructing 3D models from images (e.g. [26]). However,
in order to be able to change pose and illumination, it is important to separate pure albedo at any given point from the
influence of shading and cast shadows in the image. In the inventive approach, this can be achieved because the
matching procedure provides an estimate of 3D shape, pose, and illumination conditions. Subsequent to matching,
we compare the prediction $\mathbf{I}_{mod,i}$ for each vertex $i$ with $\mathbf{I}_{input}(p_{x,i}, p_{y,i})$, and compute the minimum change in texture
$(R_i, G_i, B_i)$ that accounts for the difference. In areas occluded in the image, we rely on the prediction made by the model.
Data from multiple images can be blended using methods similar to [26].
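The texture adaptation step can be sketched as follows. The sketch assumes a simplified shading model in which the predicted color of a vertex is its albedo times a per-channel shading factor plus an additive specular term; the function name, argument layout and the small threshold are hypothetical and serve only to illustrate the idea.

```python
import numpy as np


def extract_texture(model_texture, shading, specular, input_colors, visible):
    """Correct per-vertex albedo so that the rendered color matches the input.

    Assumed model (illustrative): predicted = albedo * shading + specular.
    All color arrays have shape (n_vertices, 3); `visible` has shape (n_vertices,).
    """
    predicted = model_texture * shading + specular
    delta = np.zeros_like(model_texture)
    # Only visible vertices with non-vanishing shading can be corrected.
    ok = visible & np.all(shading > 1e-6, axis=1)
    delta[ok] = (input_colors[ok] - predicted[ok]) / shading[ok]
    # Occluded vertices keep the texture predicted by the model.
    return model_texture + delta
```

Dividing the color residual by the shading factor is what separates albedo from illumination under this assumed model: the same residual corresponds to a larger albedo change in a dark region than in a brightly lit one.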

III.1 Matching a morphable model to 3D scans

[0058] The method described above can also be applied to register new 3D faces. Analogous to images, where
perspective projection $P: \mathbb{R}^3 \to \mathbb{R}^2$ and an illumination model define a colored image $\mathbf{I}(x, y) = (R(x, y), G(x, y), B(x, y))^T$,
laser scans provide a two-dimensional cylindrical parameterization of the surface by means of a mapping
$C: \mathbb{R}^3 \to \mathbb{R}^2$, $(x, y, z) \mapsto (h, \phi)$.
Hence, a scan can be represented as

$$\mathbf{I}(h, \phi) = (R(h, \phi), G(h, \phi), B(h, \phi), r(h, \phi))^T. \qquad (8)$$
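A minimal sketch of such a cylindrical mapping is given below. The axis convention (height along the y axis, angle measured from the z axis) is an assumption made for illustration; the text does not fix it.

```python
import math


def to_cylindrical(x, y, z):
    """Map a 3D point to (h, phi, r) around the vertical axis.

    Assumed convention: h is the height along y, phi the angle around the
    y axis measured from the z axis, r the distance from the y axis.
    """
    h = y
    phi = math.atan2(x, z)
    r = math.hypot(x, z)
    return h, phi, r
```

The pair (h, phi) plays the role of the 2D image coordinates, and r becomes a fourth channel next to the three color values, as in Equation (8).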

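[0059] In a face $(S, T)$, defined by shape and texture coefficients $\alpha_j$ and $\beta_j$ (Equation 1), vertex $i$ with texture values
$(R_i, G_i, B_i)$ and cylindrical coordinates $(r_i, h_i, \phi_i)$ is mapped to $\mathbf{I}_{model}(h_i, \phi_i) = (R_i, G_i, B_i, r_i)^T$.
[0060] The matching algorithm from the previous section now determines $\alpha_j$ and $\beta_j$ minimizing

$$E = \sum_{h,\phi} \| \mathbf{I}_{input}(h, \phi) - \mathbf{I}_{model}(h, \phi) \|^2 .$$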

IV Building a morphable model 

[0061] In this section, it is described how to build the morphable model from a set of unregistered 3D prototypes, 
and to add a new face to the existing morphable model, increasing its dimensionality.

[0062] The key problem is to compute a dense point-to-point correspondence between the vertices of the faces. 
Since the method described in Section III.1 finds the best match of a given face only within the range of the morphable
model, it cannot add new dimensions to the vector space of faces. To determine residual deviations between a novel
face and the best match within the model, as well as to set unregistered prototypes in correspondence, we use an
optic flow algorithm that computes correspondence between two faces without the need of a morphable model [32].
The following section summarizes the technique as adapted to the invention. 

IV.1 3D Correspondence using Optical Flow

[0063] Initially designed to find corresponding points in grey-level images $I(x, y)$, a gradient-based optic flow algorithm
is modified to establish correspondence between a pair of 3D scans $\mathbf{I}(h, \phi)$ (Equation 8), taking into account color and
radius values simultaneously. The algorithm computes a flow field $(\delta h(h, \phi), \delta\phi(h, \phi))$ that minimizes differences of
$\| \mathbf{I}_1(h, \phi) - \mathbf{I}_2(h + \delta h, \phi + \delta\phi) \|$ in a norm that weights variations in texture and shape. Surface properties from differential
geometry, such as mean curvature, may be used as additional components in $\mathbf{I}(h, \phi)$.
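To illustrate the principle of a gradient-based flow estimate on multi-channel scans, the following sketch solves a single weighted least-squares step for one patch. It is a generic linearized estimate under assumed conventions (grid spacing of 1, one constant flow vector per patch, hypothetical channel weights), not the specific algorithm of [32].

```python
import numpy as np


def flow_step(scan1, scan2, weights):
    """Estimate one flow vector (dh, dphi) for a patch.

    scan1, scan2: arrays of shape (H, W, C) sampled on a regular (h, phi) grid.
    weights: length-C channel weights defining the norm (hypothetical values).
    """
    # Linearize scan2(h + dh, phi + dphi) ~ scan2 + dI/dh * dh + dI/dphi * dphi.
    grad_h, grad_phi = np.gradient(scan2, axis=(0, 1))
    w = np.sqrt(np.asarray(weights, dtype=float))
    A = np.stack([(grad_h * w).ravel(), (grad_phi * w).ravel()], axis=1)
    b = ((scan1 - scan2) * w).ravel()
    # Weighted least squares over all pixels and channels of the patch.
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow
```

Because all channels enter the same linear system, a channel with little structure (for example color on a cheek) can still be resolved if another channel (for example radius) varies in the patch.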

[0064] On facial regions with little structure in texture and shape, such as forehead and cheeks, the results of the 
optical flow algorithm are sometimes spurious. We therefore perform a smooth interpolation based on simulated re- 
laxation of a system of flow vectors that are coupled with their neighbors. The quadratic coupling potential is equal for 
all flow vectors. On high-contrast areas, components of flow vectors orthogonal to edges are bound to the result of the 
previous optic flow computation. The system is otherwise free to take on a smooth minimum-energy arrangement. 
Unlike simple filtering routines, our technique fully retains matching quality wherever the flow field is reliable. Optical 
flow and smooth interpolation are computed on several consecutive levels of resolution. 

[0065] Constructing a morphable face model from a set of unregistered 3D scans requires the computation of the
flow fields between each face and an arbitrary reference face. Given a definition of shape and texture vectors $S_{ref}$ and
$T_{ref}$ for the reference face, $S$ and $T$ for each face in the database can be obtained by means of the point-to-point
correspondence provided by $(\delta h(h, \phi), \delta\phi(h, \phi))$.
IV.2 Further improving the model

[0066] Because the optic flow algorithm does not incorporate any constraints on the set of solutions, it could fail on
some of the more unusual faces in the database. Therefore, we modified a bootstrapping algorithm to iteratively improve
correspondence, on the basis of a method that has been used previously to build linear image models [33].

[0067] The basic recursive step: Suppose that an existing morphable model is not powerful enough to match a new 
face and thereby find correspondence with it. The idea is first to find rough correspondences to the novel face using 
the (inadequate) morphable model and then to improve these correspondences by using an optical flow algorithm. 
[0068] Starting from an arbitrary face as the temporary reference, preliminary correspondence between all other
faces and this reference is computed using the optic flow algorithm. On the basis of these correspondences, the shape
and texture vectors $S$ and $T$ can be computed. Their average serves as a new reference face. The first morphable
model is then formed by the most significant components as provided by a standard PCA decomposition. The current
morphable model is now matched to each of the 3D faces according to the method described in Section III.1. Then,
the optic flow algorithm computes correspondence between the 3D face and the approximation provided by the
morphable model. Combined with the correspondence implied by the matched model, this defines a new correspondence
between the reference face and the example.

[0069] Iterating this procedure with increasing expressive power of the model (by increasing the number of principal
components) leads to reliable correspondences between the reference face and the examples, and finally to a complete
morphable face model.
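The PCA step used in each round of the bootstrapping procedure can be sketched as follows. The sketch assumes that the faces are already in correspondence and stored as rows of a matrix; the function names and the use of a singular value decomposition are illustrative choices, not a prescribed implementation.

```python
import numpy as np


def build_linear_model(vectors, n_components):
    """Build a truncated PCA model from registered shape (or texture) vectors.

    vectors: array of shape (m, d), one face per row.
    Returns the average face, the leading principal components (as rows)
    and the standard deviation along each component.
    """
    mean = vectors.mean(axis=0)
    centered = vectors - mean
    # The rows of vt are the principal directions of the centered data.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    std = singular_values[:n_components] / np.sqrt(len(vectors) - 1)
    return mean, components, std


def project(vector, mean, components):
    """Coefficients of a face in the model."""
    return components @ (vector - mean)


def reconstruct(coeffs, mean, components):
    """Best approximation of a face within the model."""
    return mean + coeffs @ components
```

In a bootstrapping loop, `n_components` would be increased from round to round, and `reconstruct(project(...))` would provide the approximation against which the optic flow algorithm computes the residual correspondence.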

V Image Processing System 

[0070] One embodiment of a basic configuration of an image processing system according to the invention is
schematically illustrated in Figure 8. The image processing system 10 contains a 3D database 20, a model processor 30,
a 2D input circuit 40, an object analyzer 50, a back-projection circuit 60, a modeler circuit 70 and a 3D output circuit
80 connected with a computer graphic rendering engine 80a and/or a CAD system 80b. Further details of the
image processing system which are known as such (e.g. controlling means, keyboard input means,
display means and the like) are not shown.

[0071] The 3D database 20 contains the structure data of a plurality of objects (e.g. human faces) obtained
from a suitable optical object detection, e.g. on the basis of laser scans. The 3D database 20 is connected with the
model processor 30, which is adapted to perform the data processing steps on the basis of the methods outlined above.
As a result, the model processor 30 delivers in particular an average face (e.g. as in Figure 4, top row, right) to the
object analyzer 50 as well as reference data to the modeler circuit 70. The 2D input circuit 40 is adapted to receive
one or more input images in an appropriate format, e.g. photographs, synthesized images or the like. The 2D input
circuit 40 is connected with the object analyzer 50, which matches the morphable model received from the model processor
30 to the input image(s). As a result, the object analyzer 50 generates a 3D model of the input image which is delivered
to the back-projection circuit 60 or directly to the modeler circuit 70 or to the 3D output circuit 80. On the basis of the
3D model received from the object analyzer 50 and the original color data received from the 2D input circuit 40, the
back-projection circuit 60 performs a model correction as outlined above. The corrected model is delivered to the
modeler circuit 70 or directly to the 3D output circuit 80. Finally, the modeler circuit 70 is adapted to introduce amended
facial features to the (corrected) 3D model using the input of the model processor 30 as outlined above.

VI Results and modifications 

[0072] According to the invention, a morphable face model has been built by automatically establishing
correspondence between all of e.g. 200 exemplar faces. The interactive face modeling system enables human users to create
new characters and to modify facial attributes by varying the model coefficients. The modifiable facial attributes comprise
e.g. gaining or losing weight, frowning or smiling or even "being forced to smile". Within the constraints imposed by
prior probability, there is a large variability of possible faces, and all linear combinations of the exemplar faces look
natural.

[0073] The expressive power of the morphable model has been tested by automatically reconstructing 3D faces from 
photographs of arbitrary Caucasian faces of middle age that were not in the database. The images were either taken 
by us using a digital camera (Figures 4, 5), or taken under arbitrary unknown conditions (Figure 6). 
[0074] In all examples, we matched a morphable model built from the first 100 shape and the first 100 texture principal
components that were derived from the whole dataset of 200 faces. Each component was additionally segmented in
4 parts (see Figure 2). The whole matching procedure was performed in $10^5$ iterations. On an SGI R10000 processor,
computation time was 50 minutes. 

[0075] Reconstructing the true 3D shape and texture of a face from a single image is an ill-posed problem. However,
to human observers who also know only the input image, the results obtained with our method look correct. When
compared with a real image of the rotated face, differences usually become visible only for large rotations of more than
60°.

[0076] There is a wide variety of applications for 3D face reconstruction from 2D images. As demonstrated in Figure
6, the results can be used for automatic post-processing of a face within the original picture or movie sequence.

[0077] Knowing the 3D shape of a face in an image provides a segmentation of the image into face area and
background. The face can be combined with other 3D graphic objects, such as glasses or hats, and then be rendered in
front of the background, computing cast shadows or new illumination conditions (Fig. 6). Furthermore, we can change
the appearance of the face by adding or subtracting specific attributes. If previously unseen backgrounds become
visible, the holes can be filled with neighboring background pixels.

[0078] We also applied the method to paintings such as Leonardo's Mona Lisa (Figure 7). Figure 7 illustrates a
reconstructed 3D face of Mona Lisa (top center and right). For modifying the illumination, color differences (bottom left)
are computed on the 3D face, and then added to the painting (bottom center). Additional warping generated new
orientations (bottom right). Illumination-corrected texture extraction, however, is difficult here, due to unusual (maybe
unrealistic) lighting. We therefore apply a different method for transferring all details of the painting to novel views. For
new illumination (Figure 7, bottom center), we render two images of the reconstructed 3D face with different illumination,
and add differences in pixel values (Figure 7, bottom left) to the painting. For a new pose (bottom right), differences in
shading are transferred in a similar way, and the painting is then warped according to the 2D projections of 3D vertex
displacements of the reconstructed shape.
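The relighting step described for the painting amounts to adding a rendered difference image to the original. A minimal sketch, assuming float images with values in [0, 1] that are already aligned pixel by pixel (the function name and the clipping are illustrative choices):

```python
import numpy as np


def transfer_illumination(painting, render_old, render_new):
    """Add the pixel-wise difference between two renderings to the painting.

    render_old: the reconstructed 3D face rendered under the estimated
    original illumination; render_new: the same face under the new
    illumination. All arrays share one shape, values assumed in [0, 1].
    """
    diff = render_new.astype(float) - render_old.astype(float)
    return np.clip(painting.astype(float) + diff, 0.0, 1.0)
```

Because only differences are transferred, all details of the painting that the model cannot reproduce remain untouched in regions where the two renderings agree.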
[0079] According to the invention, the basic components for a fully automated face modeling system based on prior
knowledge about the possible appearances of faces are presented. Further extensions are contemplated under the
following aspects:

[0080] Issues of implementation: We plan to speed up our matching method by implementing a simplified Newton
method for minimizing the cost function (Equation 5). Instead of the time-consuming computation of derivatives for
each iteration step, a global mapping of the matching error into the parameter space can be used [8].

[0081] Data reduction applied to shape and texture data will reduce redundancy of our representation, saving
additional computation time.

[0082] Extending the database: While the current database is sufficient to model Caucasian faces of middle age, 
we would like to extend it to children, to elderly people as well as to other races. 
[0083] We also plan to incorporate additional 3D face examples representing the time course of facial expressions
and visemes, the face variations during speech. 

[0084] The laser scanning technology is to be extended to the collection of dynamical 3D face data. The development 
of fast optical 3D digitizers [25] will allow us to apply the method to streams of 3D data during speech and facial 
expressions. 

[0085] Extending the face model: The current morphable model for human faces is restricted to the face area,
because a sufficient 3D model of hair cannot be obtained with our laser scanner. For animation, the missing part of 
the head can be automatically replaced by a standard hair style or a hat, or by hair that is modeled using interactive 
manual segmentation and adaptation to a 3D model [26, 25]. Automated reconstruction of hair styles from images is 
one of the future challenges. 

[0086] The invention can be used with advantage in the field of image recognition. From a matched face model, the
coefficients are used as a coding of the respective face. An image to be investigated is identified as this face if the
coefficients corresponding to the image are identical or similar to the coding coefficients of the model.
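The recognition scheme can be sketched as a nearest-neighbor comparison of coefficient vectors. The Euclidean distance, the threshold and the gallery structure are assumptions for illustration; the text only requires the coefficients to be "identical or similar".

```python
import math


def identify(query, gallery, threshold):
    """Return the name of the most similar stored coding, or None.

    query: coefficient vector obtained by matching the model to an image.
    gallery: dict mapping a name to a stored coefficient vector.
    threshold: hypothetical maximum distance still counted as "similar".
    """
    best_name, best_dist = None, math.inf
    for name, coding in gallery.items():
        dist = math.dist(query, coding)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None


gallery = {"face_a": [0.0, 0.0], "face_b": [3.0, 4.0]}
```

In practice the coefficients might be scaled by their standard deviations before comparison, so that components with large natural variation do not dominate the distance; this is a design option, not something the text prescribes.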
[0087] Further applications of the invention are given in the field of modelling images of three-dimensional objects
other than human faces. These objects comprise e.g. complete human bodies, bodies or faces of animals, technical
objects (such as cars, furniture) and the like.
Claims

1. Method of processing an image of a three-dimensional object, comprising the steps of 

- providing a morphable object model being derived from a plurality of 3D images, 

- matching the morphable object model to at least one 2D object image, and 

- providing the matched morphable object model as a 3D representation of the object. 

2. Method according to claim 1, wherein the matched object model is subjected to a back-projection to color data of
the 2D input image of the object.

3. Method according to claim 2, wherein the back-projection yields an illumination correction for obtaining color data 
of the surface of the object.

4. Method according to claim 3, wherein the color corrected data are subjected to an adaptation to changed illumination
conditions.

5. Method according to one of the foregoing claims, wherein the matched morphable object model is subjected to a 
modelling step for modifying at least one object feature. 

6. Method according to one of the foregoing claims, wherein the objects comprise human faces, animal faces, human 
bodies, animal bodies or technical objects. 

7. Method of generating a morphable object model, comprising the steps of 

- generating a 3D database comprising a plurality of 3D images of prototype objects, 

- subjecting the data of the 3D database to a data processing providing correspondences between the prototype 
objects and at least one reference object, and 

- providing the morphable object model as a set of objects comprising linear combinations of the shapes and 
textures of the prototype objects. 

8. Method according to claim 7, wherein the reference object is represented by average object data. 

9. Method according to claim 7 or 8, wherein the set of objects is parameterized with the coefficients of the linear 
combinations and a probability distribution of the coefficients is estimated. 

10. Method according to one of the claims 7 to 9, wherein the morphable object model is generated for a segment of 
the object. 

11. Method according to one of the claims 7 to 10, wherein the objects are human faces, animal faces, human bodies,
animal bodies or technical objects. 

12. Method of recognizing an object, wherein a 3D model of the object to be recognized is processed with a method 
according to one of the foregoing claims.

13. Method of synthesizing a 3D model of a face with certain facial attributes with the method according to one of the
claims 1 to 11. 

14. Image processing system (10) for processing 3D images, comprising a 3D database (20), a model processor (30), 
a 2D input circuit (40), an object analyzer (50), and a 3D output circuit (80).

15. System according to claim 14, which further comprises a back-projection circuit (60) and/or a modeler circuit (70).

16. System according to claim 14 or 15, wherein the 3D output circuit (80) comprises a computer graphic rendering 
engine and/or a CAD processor. 